Abstract: Embedded systems often use specialized hardware accelerators to improve performance and reduce energy consumption, especially in areas such as signal processing, video processing, communications, and computer vision. Hardware acceleration has proven to be an extremely promising implementation strategy for the digital signal processing (DSP) domain. An accelerator is a hardware module that can be attached to a processor core. It enhances performance or functionality by executing certain functions in the accelerator instead of in the processor core. The accelerator module consists mainly of flexible computational units (FCUs). The structure of the FCU is designed to enable high-performance, flexible operation chaining based on a set of operation templates found in DSP kernels. The number of FCUs is determined at design time based on the instruction-level parallelism and the area constraints imposed by the designer. In this work, a high-performance architectural scheme is designed by combining the architectural and arithmetic levels of abstraction. The proposed solution forms an efficient design tradeoff of 46%, delivering implementations optimized for latency, area, and energy. It also provides high computing performance, real-time processing, and power efficiency to a variety of applications ranging from sensors to servers. The accelerator module finds wide application in areas such as video encoding and decoding and in image processing applications where high-performance computation is needed.

Keywords: DSP Accelerator, Flexible computational unit, Digital signal processor, Modified Booth, Multiplier.
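
For readers unfamiliar with the Modified Booth multiplier named in the keywords, the following is a minimal software sketch of radix-4 Modified Booth recoding. It is an illustration only, not the hardware design proposed in this work; the function names `booth_radix4_digits` and `booth_multiply` and the default 16-bit width are assumptions made for the example.

```python
def booth_radix4_digits(y: int, bits: int = 16) -> list[int]:
    """Recode a signed multiplier into radix-4 Booth digits in {-2, -1, 0, 1, 2}.

    Assumes y fits in `bits`-wide two's complement (illustrative width).
    """
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    if bits % 2:
        bits += 1  # sign-extend to an even number of bits
    pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        # Each digit is y[2i-1] + y[2i] - 2*y[2i+1]
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int = 16) -> int:
    """Multiply x by y by summing one partial product per Booth digit."""
    digits = booth_radix4_digits(y, bits)
    # Digit i carries weight 4**i, so only bits/2 partial products are needed.
    return sum(d * x * (4 ** i) for i, d in enumerate(digits))
```

The recoding halves the number of partial products relative to a bit-by-bit multiplier, which is the usual motivation for using it in a hardware multiplier.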